Maximum Entropy based Rule Selection Model for Syntax-based Statistical Machine Translation
نویسندگان
چکیده
This paper proposes a novel maximum entropy based rule selection (MERS) model for syntax-based statistical machine translation (SMT). The MERS model combines local contextual information around rules and information of sub-trees covered by variables in rules. Therefore, our model allows the decoder to perform context-dependent rule selection during decoding. We incorporate the MERS model into a state-of-the-art linguistically syntax-based SMT model, the treeto-string alignment template model. Experiments show that our approach achieves significant improvements over the baseline system.
منابع مشابه
Improving Statistical Machine Translation using Lexicalized Rule Selection
This paper proposes a novel lexicalized approach for rule selection for syntax-based statistical machine translation (SMT). We build maximum entropy (MaxEnt) models which combine rich context information for selecting translation rules during decoding. We successfully integrate the MaxEnt-based rule selection models into the state-of-the-art syntax-based SMT model. Experiments show that our lex...
متن کاملA Continuous Space Rule Selection Model for Syntax-based Statistical Machine Translation
One of the major challenges for statistical machine translation (SMT) is to choose the appropriate translation rules based on the sentence context. This paper proposes a continuous space rule selection (CSRS) model for syntax-based SMT to perform this context-dependent rule selection. In contrast to existing maximum entropy based rule selection (MERS) models, which use discrete representations ...
متن کاملUnsupervised training of maximum-entropy models for lexical selection in rule-based machine translation
This article presents a method of training maximum-entropy models to perform lexical selection in a rule-based machine translation system. The training method described is unsupervised; that is, it does not require any annotated corpus. The method uses source-language monolingual corpora, the machine translation (MT) system in which the models are integrated, and a statistical target-language m...
متن کاملA new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...
متن کاملRule Selection with Soft Syntactic Features for String-to-Tree Statistical Machine Translation
In syntax-based machine translation, rule selection is the task of choosing the correct target side of a translation rule among rules with the same source side. We define a discriminative rule selection model for systems that have syntactic annotation on the target language side (stringto-tree). This is a new and clean way to integrate soft source syntactic constraints into string-to-tree syste...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008